Script for Protein Sequence Determination

PRELIMINARY

The basic strategy for sequencing a protein is to break the protein into fragments small enough to be sequenced directly. These sequences are spliced together by comparing the overlapping regions of the fragments. In this exploration, you will be guided through each step using the protein ribonuclease A as an example.

Denature Protein

Before fragmenting the protein, it must be denatured; that is, it must be unfolded from its compact globular shape into an extended linear form to make it more accessible to attack by chemical and enzymatic reagents. Here, each amino acid in the protein is drawn as a single sphere. Heating the protein or adding substances such as urea will denature most proteins. However, the resulting unfolded chain is not yet ready for analysis. Many proteins have disulfide cross-links between cysteine side chains, drawn here as yellow spheres. These must be cleaved to facilitate the sequence determination.

Reduce Disulfides

A common way to cleave disulfide bonds is to add the reducing agent 2-mercaptoethanol. Under suitable conditions, two of these mercaptoethanol molecules react to cleave a disulfide link in a polypeptide by forming a disulfide bond between themselves. Effectively, the disulfide bond is transferred from the polypeptide to the mercaptoethanols. The newly liberated sulfhydryl groups must then be chemically protected to prevent the reformation of disulfide bonds through air oxidation. This is often done by treatment with iodoacetic acid, an alkylating agent that reacts with cysteine residues to form S-carboxymethylcysteine, a compound that is unreactive under the conditions to which the polypeptide will subsequently be exposed.

Take a moment to examine the fully denatured protein. Convince yourself that it is indeed a linear chain from one end to the other. When you are ready to continue, click on the NEXT button.

Determine Subunits

At this point, we must determine how many different types of subunits our protein contains. Since each polypeptide chain has exactly one N-terminus and one C-terminus, identifying how many unique terminal groups there are will tell us how many nonidentical subunits we are dealing with. In a common technique, the N-terminal amino group is reacted with dansyl chloride. The resulting dansyl polypeptide is then hydrolyzed under acidic conditions, which completely degrades it to its component amino acids but leaves its N-terminal residue as a highly fluorescent dansylamino acid. Here the purple balls represent amino acids and the dansyl group is represented by a yellow ball that is covalently linked to the N-terminal residue.

The protein has now been converted to a mixture of single amino acids, one or more of which are labeled with the dansyl group. The next step is to separate out the highly fluorescent dansyl-labeled residues and identify the amino acids to which they are attached. This can be done by a variety of chromatographic techniques. Here we use an HPLC apparatus set up specifically for amino acid separation and identification. The sample is injected and the dansyl-labeled residue (in yellow) separates from the others as it passes through the column. Its relative elution volume depends precisely on the amino acid to which it is attached, thereby identifying the N-terminal residue. Here we detect only one dansylamino acid, indicating that our protein consists of only one type of polypeptide chain. More than one dansyl-labeled peak would have indicated that the protein consists of more than one type of polypeptide. These subunits would have to be separated from one another, usually by chromatographic means, before the subsequent steps of the analysis could be carried out. Of course, there is a slight chance that two different polypeptides have the same N-terminal amino acid, in which case the conclusion that there is only one polypeptide chain would be erroneous.
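The counting argument in this step can be sketched in a few lines of Python. This is only an illustration: the chain sequences are made-up examples in one-letter code, and `distinct_n_termini` is a hypothetical helper, not part of any library.

```python
def distinct_n_termini(chains):
    """Return the set of N-terminal residues found among the chains."""
    return {chain[0] for chain in chains if chain}

# Hypothetical subunits, written N-terminus first.
one_chain_type = ["KETAAAK", "KETAAAK"]       # two copies of the same chain
two_chain_types = ["GAVLK", "FSTNR"]          # different N-terminal residues
hidden_second_type = ["KETAAAK", "KSRNLTK"]   # different chains, same N-terminus

for sample in (one_chain_type, two_chain_types, hidden_second_type):
    labelled = distinct_n_termini(sample)
    print(sorted(labelled), "-> at least", len(labelled), "type(s) of chain")
```

The last sample shows why the count is only a lower bound: two different chains that begin with the same residue give a single dansyl-labeled peak.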


ANALYZE SEGMENTS

Cleave into Segments

Polypeptide sequences longer than 40 to 100 residues cannot be directly sequenced and therefore polypeptides longer than this must be cleaved into manageable-sized fragments.

Certain enzymes cleave peptide bonds with great specificity. Trypsin, for example, hydrolyzes only the peptide bond that follows a lysine or arginine residue, and only if the next residue is not proline. Here, the lysines are drawn in green and the arginines in blue. As trypsin locates the various lysine and arginine side chains, it hydrolyzes the following peptide bond, releasing fragments from the polypeptide. These fragments have different lengths, but because trypsin cleaves all copies of the protein the same way, there are only a small number of unique fragments. In the case of ribonuclease A, trypsin digestion generates a total of 14 fragments.
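If you want to check the fragment count yourself, the cleavage rule can be applied to a sequence in silico. The sketch below assumes the usual rule (cut after lysine or arginine unless the next residue is proline) and assumes the 124-residue sequence of mature bovine ribonuclease A shown in the code; the sequence is not given in this script, so verify it against a sequence database before relying on it. `trypsin_digest` is a hypothetical helper name.

```python
# Assumed input: mature bovine ribonuclease A in one-letter code (124 residues).
RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ"
    "KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHF"
    "DASV"
)

def trypsin_digest(sequence):
    """Cut after every K or R unless the next residue is P."""
    fragments = []
    start = 0
    for i in range(len(sequence) - 1):   # the last residue has no bond after it
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])   # C-terminal remainder
    return fragments

tryptic = trypsin_digest(RNASE_A)
print(len(tryptic), "fragments")
for fragment in tryptic:
    print(fragment)
```

Under these assumptions the lysine that is followed by proline stays uncut, which is why the fragment beginning CKPV contains an internal lysine.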

The mixture of fragments must now be separated. An HPLC device will again do the trick.

The sample containing the trypsin-cleaved polypeptide fragments, here represented by purple balls, is injected, and the fragments appear as bands in the effluent. As each fragment emerges from the column, it is saved for sequencing in the next step.

Edman Degradation

The amino acid sequence of each fragment can now be determined through repeated cycles of Edman degradation.

Let's examine one iteration of this process. In the first step of the Edman degradation, phenylisothiocyanate (PITC) specifically reacts with the N-terminal amino acid residue of the polypeptide fragment to yield a phenylthiocarbamyl (PTC) polypeptide. Following this, the unreacted PITC is washed away and the PTC polypeptide is exposed to anhydrous trifluoroacetic acid. This induces a reaction that cleaves the N-terminal peptide bond, while causing the N-terminal residue to cyclize to form a thiazolinone derivative. Thus, the original polypeptide has lost its N-terminal residue but is otherwise unchanged. The thiazolinone derivative is then extracted into an organic solvent and, under acidic conditions, is converted to the more stable phenylthiohydantoin (PTH) derivative, which can then be chromatographically identified. The remaining polypeptide can then be subjected to further rounds of Edman degradation until the cumulative effects of incomplete reactions, side reactions, and peptide loss make further amino acid identification unreliable.
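The way small losses accumulate over many cycles can be illustrated with a simple model. The sketch below assumes that a fixed fraction of the chains (the repetitive yield) survives each cycle in register, so the usable signal after n cycles is that fraction raised to the power n. The yield, the cutoff, and the example peptide are illustrative assumptions, not measured values, and `edman_run` is a hypothetical helper.

```python
def edman_run(peptide, repetitive_yield=0.95, min_signal=0.2):
    """Simulate repeated Edman cycles on a peptide written N-terminus first.

    Returns the residues called (with the signal at that cycle) and the
    part of the peptide left unread when the signal became too weak.
    """
    calls = []
    signal = 1.0                        # fraction of chains still in register
    remaining = peptide
    while remaining:
        signal *= repetitive_yield      # each cycle loses a little material
        if signal < min_signal:         # too weak to call the residue reliably
            break
        calls.append((remaining[0], round(signal, 3)))
        remaining = remaining[1:]       # the peptide is now one residue shorter
    return calls, remaining

calls, unread = edman_run("NGQTNCYQSYSTMSITDCR")
print("".join(residue for residue, _ in calls), "| unread:", unread or "none")
```

Lowering `repetitive_yield` in this model shortens the readable stretch quickly, which is the reason long polypeptides are first cut into fragments.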

The Edman degradation has been automated and refined so that it is now possible to sequence up to 100 residues in samples containing less than a tenth of a microgram of polypeptide. Repeating the analysis for each of the 14 polypeptide fragments in our trypsin digest of ribonuclease A yields their sequences.


RECONSTRUCT SEQUENCE

Align Sequences

Now that we know the sequences for each segment, we need to piece them back together in the proper order to reconstruct the complete polypeptide chain. We can start by identifying its C-terminal fragment. First, remember that, by custom, we always write out peptide sequences with the N-terminus on the left and the C-terminus on the right. Because trypsin cleaves after lysine or arginine, every fragment it releases ends in lysine or arginine, with the possible exception of the fragment that comes from the C-terminus of the protein. If any segment lacks lysine or arginine at its C-terminal end, it must be the C-terminus of the original polypeptide. In our case, only segment 12, which has a C-terminal valine, can be the C-terminal sequence.
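This test is easy to express as a filter. In the sketch below, the fragment list is the set of tryptic peptides expected for an assumed ribonuclease A sequence (not given in this script), listed in arbitrary order; the list positions are not the segment numbers used in the exploration. `c_terminal_candidates` is a hypothetical helper.

```python
# Assumed tryptic peptides of ribonuclease A, in arbitrary order.
TRYPTIC_FRAGMENTS = [
    "NVACK", "FER", "YPNCAYK", "K", "HIIVACEGNPYVPVHFDASV", "DR",
    "QHMDSSTSAASSSNYCNQMMK", "ETGSSK", "NLTK", "NGQTNCYQSYSTMSITDCR",
    "SR", "TTQANK", "CKPVNTFVHESLADVQAVCSQK", "ETAAAK",
]

def c_terminal_candidates(fragments, cut_after="KR"):
    """Fragments whose last residue is not one the reagent cuts after."""
    return [f for f in fragments if f and f[-1] not in cut_after]

print(c_terminal_candidates(TRYPTIC_FRAGMENTS))
```

If the protein itself happened to end in lysine or arginine, the filter would return nothing, and the C-terminal fragment would have to be found another way.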

To continue, we need additional information that tells us which polypeptide fragment comes next. This is accomplished through a second round of polypeptide cleavage with a reagent of different specificity that generates sequences that can be overlapped with those of the trypsin digest. Cyanogen bromide specifically breaks the peptide bonds after methionine residues, which for ribonuclease A would result in these five fragments. Again, the C-terminal fragment is the one that doesn't end in methionine, which here is fragment 4. Since it is a longer sequence than fragment 12 of the trypsin digest, we will use it as our initial sequence.
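The cyanogen bromide digest can be sketched the same way as the trypsin digest, with the simpler rule of cutting after every methionine. The sequence is the same assumed ribonuclease A sequence (check it against a database), `cnbr_cleave` is a hypothetical helper, and the model ignores the chemical modification of the methionine itself during cleavage.

```python
# Assumed input: mature bovine ribonuclease A in one-letter code (124 residues).
RNASE_A = (
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ"
    "KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHF"
    "DASV"
)

def cnbr_cleave(sequence):
    """Cut after every M (the residue is kept as M in this simplified model)."""
    fragments = []
    start = 0
    for i in range(len(sequence) - 1):   # no bond follows the last residue
        if sequence[i] == "M":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments

cnbr = cnbr_cleave(RNASE_A)
c_terminal = [f for f in cnbr if not f.endswith("M")]
print(len(cnbr), "fragments; C-terminal fragment:", c_terminal[0])
```

With this sequence, two adjacent methionines produce one fragment that is a single residue, which still counts toward the five.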

Now, let's look for a sequence in the trypsin digest whose C-terminal end matches the N-terminal end of our initial sequence. We expect a match to be possible up to and including any lysine or arginine in our template. The target match is highlighted in red. Looking at all the trypsin fragments, only fragment 11 matches and so will be our next sequence. Moving left again to the next boundary, we find that the next cyanogen bromide sequence must have a C-terminal end of SYSTM, which is only true for fragment 5.

The sequence for fragment 5 is thus added and we move again to the new left boundary. As it turns out, this next overlap is going to be tricky. Since the cyanogen bromide fragment starts with a lysine residue, the trypsin fragment will only have one residue in common, lysine, and many of these fragments end in lysine. To narrow down our choices, it is helpful to eliminate all the trypsin fragments that are accounted for in the sequence assigned so far. Then, remember that for cyanogen bromide to have cleaved this fragment here, there must have been a methionine residue preceding the lysine residue. Only trypsin fragment 13 ends in a methionine-lysine, M-K, so it must be the one that comes next.

The final sequence from the cyanogen bromide fragments must have a C-terminal end of QHM to align with the trypsin fragment. Fragment 2 then completes the full polypeptide sequence.
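The right-to-left walk just described can be written as a short loop. The sketch below is one way to do it, not a general assembler: it alternates between the two digests, and at each step looks for an unused fragment that ends with the current left-hand overlap and has the expected residue just before it (methionine when the boundary was made by cyanogen bromide, lysine or arginine when it was made by trypsin). The fragment lists are generated in silico from an assumed ribonuclease A sequence and then sorted, so no order information remains. `cleave` and `assemble` are hypothetical helpers.

```python
RNASE_A = (  # assumed sequence, mature bovine ribonuclease A
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ"
    "KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHF"
    "DASV"
)

def cleave(seq, after, blocked_next=""):
    """Cut seq after each residue in `after` unless the next is in `blocked_next`."""
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in after and seq[i + 1] not in blocked_next:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments

def assemble(tryptic, cnbr):
    """Rebuild the chain from its C-terminus leftward using both digests."""
    pools = {"trypsin": list(tryptic), "cnbr": list(cnbr)}
    # digest: (cuts after, blocked by next residue, residue left of the other cut)
    rules = {"trypsin": ("KR", "P", "M"), "cnbr": ("M", "", "KR")}
    c_terminal = [f for f in pools["cnbr"] if not f.endswith("M")]
    if len(c_terminal) != 1:
        raise ValueError("expected exactly one CNBr fragment not ending in Met")
    assembled = c_terminal[0]
    pools["cnbr"].remove(assembled)
    digest = "trypsin"
    while True:
        after, blocked, boundary = rules[digest]
        overlap = cleave(assembled, after, blocked)[0]   # up to the first cut site
        hits = {f for f in pools[digest]
                if len(f) > len(overlap) and f.endswith(overlap)
                and f[-len(overlap) - 1] in boundary}
        if not hits:
            return assembled      # nothing spans the left edge any more
        if len(hits) > 1:
            raise ValueError("ambiguous overlap; another digest would be needed")
        fragment = hits.pop()
        pools[digest].remove(fragment)   # each fragment is used only once
        assembled = fragment[:-len(overlap)] + assembled
        digest = "cnbr" if digest == "trypsin" else "trypsin"

tryptic = sorted(cleave(RNASE_A, "KR", "P"))
cnbr = sorted(cleave(RNASE_A, "M"))
rebuilt = assemble(tryptic, cnbr)
print(rebuilt == RNASE_A)
```

A sensible final check, in code as at the bench, is to digest the rebuilt sequence with both rules and confirm that every observed fragment is accounted for.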

Assign Disulfides

The final step in the amino acid sequence determination process is to identify the positions of the disulfide bonds. This process is begun by fragmenting the polypeptide before reducing and alkylating its disulfide bonds.

Consider the trypsin digest of ribonuclease A. The eight cysteine residues occur on six of its polypeptide fragments. For the purpose of identifying the disulfide bridges, the other fragments can be ignored. Digesting the disulfide-intact protein with trypsin will yield different disulfide-linked fragments depending on which cysteines are linked to which. In the case of ribonuclease A, our original six cysteine-containing fragments form two larger disulfide-containing fragments. These can be chromatographically separated into two fractions. The disulfide assignment is simplified, but is still not unambiguous. There are still two possibilities for each case. Let's consider one of these. If we now use cyanogen bromide to cleave this polypeptide, again BEFORE reducing and alkylating the disulfide bonds, we obtain two fractions, each with only a pair of cysteines. There is only one possibility for this case. The identities of the disulfide-linked fragments can then be determined by cleaving the disulfide bond between them and sequencing the fragments.
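To see why a second cleavage narrows the possibilities, it helps to run the experiment forward: assume a set of disulfide bonds and predict which fragments stay joined after each digest. The sketch below does this for an assumed ribonuclease A sequence and an assumed pairing of cysteines 26-84, 40-95, 58-110, and 65-72; neither is stated in this script, so treat both as inputs to verify. `cut_points` and `linked_groups` are hypothetical helpers.

```python
RNASE_A = (  # assumed sequence, mature bovine ribonuclease A
    "KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ"
    "KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHF"
    "DASV"
)
DISULFIDES = [(26, 84), (40, 95), (58, 110), (65, 72)]  # assumed, 1-based positions

def cut_points(seq, after, blocked_next=""):
    """Positions (count of residues to the left) where the reagent cuts."""
    return {i + 1 for i in range(len(seq) - 1)
            if seq[i] in after and seq[i + 1] not in blocked_next}

def linked_groups(seq, cuts, disulfides):
    """Group the cysteine-containing fragments that remain disulfide-linked."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    spans = list(zip(bounds[:-1], bounds[1:]))   # half-open, 0-based

    def owner(position):
        # Index of the fragment holding the residue at this 1-based position.
        return next(k for k, (s, e) in enumerate(spans) if s < position <= e)

    groups = []
    for a, b in disulfides:
        merged = {owner(a), owner(b)}
        untouched = []
        for group in groups:
            if group & merged:
                merged |= group          # bond joins an existing cluster
            else:
                untouched.append(group)
        groups = untouched + [merged]
    return [sorted(seq[spans[k][0]:spans[k][1]] for k in group) for group in groups]

trypsin_cuts = cut_points(RNASE_A, "KR", "P")
cnbr_cuts = cut_points(RNASE_A, "M")
after_trypsin = linked_groups(RNASE_A, trypsin_cuts, DISULFIDES)
after_both = linked_groups(RNASE_A, trypsin_cuts | cnbr_cuts, DISULFIDES)
for label, groups in (("trypsin", after_trypsin), ("trypsin + CNBr", after_both)):
    print(label)
    for group in groups:
        print("  ", " + ".join(group))
```

Under these assumptions, trypsin alone leaves two clusters of three fragments each, and adding cyanogen bromide splits one of them into two pairs. The other cluster contains no methionine, so it would need a different second reagent.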

In a similar fashion, which cysteine residues are disulfide-linked to which can be resolved for the entire polypeptide. The sequence analysis is complete.